AITopics

2406.19403

Country:

Europe > United Kingdom (0.14)
North America > United States > New York > New York County > New York City (0.04)
Europe > Italy (0.04)
(2 more...)

Genre: Research Report > New Finding (0.46)

Industry: Banking & Finance > Trading (1.00)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Clustering (1.00)

Pacheco, Armando J. Cabrera, Derr, Rabanus, Williamson, Robert C.

An Axiomatic Approach to Loss Aggregation and an Adapted Aggregating Algorithm

arXiv.org Artificial IntelligenceJun-4-2024

Supervised learning has gone beyond the expected risk minimization framework. Central to most of these developments is the introduction of more general aggregation functions for losses incurred by the learner. In this paper, we turn towards online learning under expert advice. Via easily justified assumptions we characterize a set of reasonable loss aggregation functions as quasi-sums. Based upon this insight, we suggest a variant of the Aggregating Algorithm tailored to these more general aggregation functions. This variant inherits most of the nice theoretical properties of the AA, such as recovery of Bayes' updating and a time-independent bound on quasi-sum regret. Finally, we argue that generalized aggregations express the attitude of the learner towards losses.

aggregating algorithm, aggregation, axiomatic loss aggregation, (12 more...)

2406.02292

Country:

Europe > Germany > Baden-Württemberg > Tübingen Region > Tübingen (0.14)
Europe > Germany > Brandenburg > Potsdam (0.04)
Europe > United Kingdom > England > Cambridgeshire > Cambridge (0.04)
(3 more...)

Genre: Research Report > New Finding (0.46)

Industry: Education (0.88)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (1.00)

Neural Information Processing SystemsApr-6-2023, 17:47:43 GMT

Competitive On-line Linear Regression

aggregating algorithm, competitive on-line linear regression, procedure, (1 more...)

Neural Information Processing Systems

Technology: Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Regression (1.00)

Duchi, John, Marsden, Annie, Valiant, Gregory

On Misspecification in Prediction Problems and Robustness via Improper Learning

arXiv.org Machine LearningJan-13-2021

Suppose we seek a probability distribution p(y x) modeling outcomes y given data x. The typical approach is to choose a parametric family of probability distributions, then find the "best" member of this family according to a given loss. It is rarely realistic to assume that the parametric family is well-specified, and thus it is important to understand the consequences of misspecification and how to circumvent these downsides. To address these challenges, in this paper we derive a new measure of a problem's robustness to misspecification that relies on the curvature of the loss at hand and putative parametric family, proving that this measure lower bounds convergence rates for prediction error and certifies the failure of a parametric family and loss to be robust (or achieve optimal convergence rates for prediction). To complement this new family of lower bounds for probabilistic prediction problems, we build out of earlier work on improper learning [40, 14]--when we may choose predictions p(y x) outside the given model family--to show how it is possible to be robust to such misspecification, and moreover, we give new optimality guarantees for such improper procedures. Formalizing our setting, we consider the following probabilistic game: a player receives a covariate vector x X, plays a distribution p(· x) on a target set Y, then receives y Y and suffers loss L(p(· x), y). We study both a sequential and a stochastic variant of this problem.

algorithm, exp, misspecification, (17 more...)

2101.05234

Country:

Europe > United Kingdom > England > Cambridgeshire > Cambridge (0.04)
Asia > Middle East > Jordan (0.04)

Genre: Research Report > New Finding (0.69)

Technology:

Information Technology > Modeling & Simulation (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning (0.94)

Kamalaruban, Parameswaran, Williamson, Robert C., Zhang, Xinhua

Exp-Concavity of Proper Composite Losses

arXiv.org Machine LearningMay-20-2018

The goal of online prediction with expert advice is to find a decision strategy which will perform almost as well as the best expert in a given pool of experts, on any sequence of outcomes. This problem has been widely studied and $O(\sqrt{T})$ and $O(\log{T})$ regret bounds can be achieved for convex losses (\cite{zinkevich2003online}) and strictly convex losses with bounded first and second derivatives (\cite{hazan2007logarithmic}) respectively. In special cases like the Aggregating Algorithm (\cite{vovk1995game}) with mixable losses and the Weighted Average Algorithm (\cite{kivinen1999averaging}) with exp-concave losses, it is possible to achieve $O(1)$ regret bounds. \cite{van2012exp} has argued that mixability and exp-concavity are roughly equivalent under certain conditions. Thus by understanding the underlying relationship between these two notions we can gain the best of both algorithms (strong theoretical performance guarantees of the Aggregating Algorithm and the computational efficiency of the Weighted Average Algorithm). In this paper we provide a complete characterization of the exp-concavity of any proper composite loss. Using this characterization and the mixability condition of proper losses (\cite{van2012mixability}), we show that it is possible to transform (re-parameterize) any $\beta$-mixable binary proper loss into a $\beta$-exp-concave composite loss with the same $\beta$. In the multi-class case, we propose an approximation approach for this transformation.

artificial intelligence, machine learning, substitution function, (16 more...)

1805.07737

Country: Oceania > Australia (0.28)

Genre: Research Report (0.81)

Technology: Information Technology > Artificial Intelligence > Machine Learning (1.00)

Foster, Dylan J., Kale, Satyen, Luo, Haipeng, Mohri, Mehryar, Sridharan, Karthik

Logistic Regression: The Importance of Being Improper

arXiv.org Machine LearningMar-25-2018

Learning linear predictors with the logistic loss---both in stochastic and online settings---is a fundamental task in learning and statistics, with direct connections to classification and boosting. Existing "fast rates" for this setting exhibit exponential dependence on the predictor norm, and Hazan et al. (2014) showed that this is unfortunately unimprovable. Starting with the simple observation that the logistic loss is 1-mixable, we design a new efficient improper learning algorithm for online logistic regression that circumvents the aforementioned lower bound with a regret bound exhibiting a doubly-exponential improvement in dependence on the predictor norm. This provides a positive resolution to a variant of the COLT 2012 open problem of McMahan and Streeter (2012) when improper learning is allowed. This improvement is obtained both in the online setting and, with some extra work, in the batch statistical setting with high probability. We also show that the improved dependency on predictor norm is also near-optimal. Leveraging this improved dependency on the predictor norm yields the following applications: (a) we give algorithms for online bandit multiclass learning with the logistic loss with an $\tilde{O}(\sqrt{n})$ relative mistake bound across essentially all parameter ranges, thus providing a solution to the COLT 2009 open problem of Abernethy and Rakhlin (2009), and (b) we give an adaptive algorithm for online multiclass boosting with optimal sample complexity, thus partially resolving an open problem of Beygelzimer et al. (2015) and Jung et al. (2017). Finally, we give information-theoretic bounds on the optimal rates for improper logistic regression with general function classes, thereby characterizing the extent to which our improvement for linear classes extends to other parameteric and even nonparametric settings.

algorithm, artificial intelligence, machine learning, (16 more...)

1803.09349

Country: North America > United States (0.67)

Genre:

Research Report > New Finding (1.00)
Research Report > Experimental Study (1.00)

Industry: Education > Educational Setting (0.46)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Regression (0.91)

Reid, Mark D., Frongillo, Rafael M., Williamson, Robert C.

Generalised Mixability, Constant Regret, and Bayesian Updating

arXiv.org Machine LearningMar-10-2014

The combination or aggregation of predictions is central to machine learning. Traditional Bayesian updating can be viewed as a particular way of aggregating information that takes account of prior information. Notions of "mixability" which play a central role in the setting of prediction with expert advice offer a more general way to aggregate, and which take account of the loss function used to evaluate predictions (how well they fit the data). As shown by Vovk [2001], his more general "aggregating algorithm" reduces to Bayesian updating when log loss is used. However, as we will show there is another design variable that to date has not been fully exploited. The aggregating algorithm makes use of a distance between the current distribution and a prior which serves as a regulariser. In particular the aggregating algorithm uses the KL-divergence. We consider the general setting of an arbitrary loss and an arbitrary regulariser (in the form of a Bregman divergence) and show that we recover the core technical 1 result of traditional mixability: if a loss is mixable in the generalised sense then there is a generalised aggregating algorithm which can be guaranteed to have constant regret.

algorithm, artificial intelligence, machine learning, (16 more...)

1403.2433

Genre: Research Report (0.50)

Technology: Information Technology > Artificial Intelligence > Machine Learning (0.90)

arXiv.org Artificial IntelligenceDec-1-2009

Competing with Markov prediction strategies

Vovk, Vladimir

This paper belongs to the area of research known as universal pre diction of individual sequences (see [2] for a review): the predictor's goal is t o compete with a wide benchmark class of prediction strategies. In the previou s papers [15] and [14] we constructed prediction strategies competitive with the important classes of Markov and stationary, respectively, continuous pred iction strategies. In this paper we consider competing against possibly discontinuous s trategies. Our main results assert the existence of prediction strategies com petitive with the Markov strategies. This paper's idea of transition from continuous to general benchma rk classes was motivated by Skorokhod's topology for the space D of "c` adl` ag" functions, most of which are discontinuous. Skorokhod's idea was to allow small d eforma-tions not only along the vertical axis but also along the horizontal ax is when defining neighborhoods. Skorokhod's topology was metrized by Kolm ogorov so that it became a separable space ([1], Appendix III; [11], p. 913), w hich allows us to apply one of the numerous algorithms for prediction with exper t advice (Kalnishkan and Vyugin's Weak Aggregating Algorithm in this paper) to construct a universal algorithm. In Section 2 we give the main definitions and state our main results, Th eo-rems 1 and 2; their proofs are given in Sections 3 and 4, respectively .

artificial intelligence, machine learning, prediction strategy, (15 more...)

cs/0607136

Country: Europe > United Kingdom > England > Cambridgeshire > Cambridge (0.14)

Genre: Research Report (0.40)

Technology: Information Technology > Artificial Intelligence > Machine Learning (0.47)

Zhdanov, Fedor, Vovk, Vladimir, Burford, Brian, Devetyarov, Dmitry, Nouretdinov, Ilia, Gammerman, Alex

Online prediction of ovarian cancer

arXiv.org Artificial IntelligenceApr-9-2009

In this paper we apply computer learning methods to diagnosing ovarian cancer using the level of the standard biomarker CA125 in conjunction with information provided by mass-spectrometry. We are working with a new data set collected over a period of 7 years. Using the level of CA125 and mass-spectrometry peaks, our algorithm gives probability predictions for the disease. To estimate classification accuracy we convert probability predictions into strict predictions. Our algorithm makes fewer errors than almost any linear combination of the CA125 level and one peak's intensity (taken on the log scale). To check the power of our algorithm we use it to test the hypothesis that CA125 and the peaks do not contain useful information for the prediction of the disease at a particular time before the diagnosis. Our algorithm produces $p$-values that are better than those produced by the algorithm that has been previously applied to this data set. Our conclusion is that the proposed algorithm is more reliable for prediction on new data.

artificial intelligence, machine learning, prediction, (16 more...)

0904.1579

Country:

North America > United States (0.28)
Europe (0.28)

Genre:

Research Report > Experimental Study (0.93)
Research Report > New Finding (0.67)

Industry: Health & Medicine > Therapeutic Area > Oncology > Ovarian Cancer (0.63)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Performance Analysis (0.34)

Neural Information Processing SystemsDec-31-1998

Competitive On-line Linear Regression

Vovk, Volodya

We apply a general algorithm for merging prediction strategies (the Aggregating Algorithm) to the problem of linear regression with the square loss; our main assumption is that the response variable is bounded. It turns out that for this particular problem the Aggregating Algorithm resembles, but is slightly different from, the wellknown ridge estimation procedure. From general results about the Aggregating Algorithm we deduce a guaranteed bound on the difference between our algorithm's performance and the best, in some sense, linear regression function's performance. We show that the AA attains the optimal constant in our bound, whereas the constant attained by the ridge regression procedure in general can be 4 times worse. 1 INTRODUCTION The usual approach to regression problems is to assume that the data are generated by some stochastic mechanism and make some, typically very restrictive, assumptions about that stochastic mechanism. In recent years, however, a different approach to this kind of problems was developed (see, e.g., DeSantis et al. [2], Littlestone and Warmuth [7]): in our context, that approach sets the goal of finding an online algorithm that performs not much worse than the best regression function found off-line; in other words, it replaces the usual statistical analyses by the competitive analysis of online algorithms. DeSantis et al. [2] performed a competitive analysis of the Bayesian merging scheme for the log-loss prediction game; later Littlestone and Warmuth [7] and Vovk [10] introduced an online algorithm (called the Weighted Majority Algorithm by the Competitive Online Linear Regression 365 former authors) for the simple binary prediction game. These two algorithms (the Bayesian merging scheme and the Weighted Majority Algorithm) are special cases of the Aggregating Algorithm (AA) proposed in [9, 11]. The AA is a member of a wide family of algorithms called "multiplicative weight" or "exponential weight" algorithms. Closer to the topic of this paper, Cesa-Bianchi et al. [1) performed a competitive analysis, under the square loss, of the standard Gradient Descent Algorithm and Kivinen and Warmuth [6] complemented it by a competitive analysis of a modification of the Gradient Descent, which they call the Exponentiated Gradient Algorithm.

algorithm, learner, ridge regression procedure, (12 more...)

Neural Information Processing Systems

Country:

North America > United States > New York (0.05)
North America > United States > California > San Mateo County > San Mateo (0.04)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Regression (1.00)